Preface



Fast Software Encryption is a workshop on symmetric cryptography, now eleven
years old, covering the design and analysis of block ciphers and stream
ciphers as well as hash functions and message authentication codes. The FSE
workshop was first held in Cambridge in 1993, followed by Leuven in 1994, 
Cambridge in 1996, Haifa in 1997, Paris in 1998, Rome in 1999, New York in 
2000, Yokohama in 2001, Leuven in 2002, and Lund in 2003. 

This Fast Software Encryption Workshop, FSE 2004, was held February 5-7, 
2004 in Delhi, India. The workshop was sponsored by IACR (the International 
Association for Cryptologic Research) and organized in cooperation with the 
Indian Statistical Institute, Delhi, and the Cryptology Research Society of India 
(CRSI). 

This year a total of 75 papers were submitted to FSE 2004. After a seven-week
reviewing process, 28 papers were accepted for presentation at the workshop. In 
addition, we were fortunate to have in the program two invited talks by Adi 
Shamir and David Wagner. 

During the workshop a rump session was held. Seven presentations were made
and all the presenters were given the option of submitting their presentations
for possible inclusion in the proceedings. Only one paper from this session was
submitted; it was refereed and accepted. This paper appears at the end of
these proceedings.

We would like to thank the following people. First, Springer-Verlag for
publishing the proceedings in the Lecture Notes in Computer Science series. Next,
the submitting authors, the committee members, the external reviewers, the
general co-chairs Subhamoy Maitra and R.L. Karandikar, and the local organizing
committee, for their hard work. Bart Preneel for letting us use COSIC’s
Webreview software in the review process and Thomas Herlea for his support. We
are indebted to Lund University, especially Thomas Johansson, Bijit Roy and
Sugata Gangopadhyay, for hosting the Webreview site. Additionally we would like
to thank Partha Mukhopadhyay, Sourav Mukhopadhyay, Malapati Raja Sekhar, and
Chandan Biswas for handling all the submissions and Madhusudan Karan for putting
together the pre-proceedings. We would also like to thank the sponsors: Infosys
Technologies Ltd., Honeywell Corporation and Via Technology.

The organizing committee consisted of Sanjay Burman (CAIR, Bangalore), 
Ramendra S. Baoni (Bisecure Technologies Pvt. Ltd., Delhi), Hiranmoy Ghosh 
(Tata Infotech Ltd., Delhi), Abdul Sakib Mondal (Infosys Technologies Ltd.,
Bangalore), Arup Pal (ISI, Delhi), N.R. Pillai (SAG, Delhi), P.K. Saxena (SAG,
Delhi), and Amitabha Sinha (ISI, Kolkata), who served as Treasurer. Thank you 
to them all. 

Bimal Roy and Willi Meier

Abstract. A T-function is a mapping from $n$-bit words to $n$-bit words
in which for each $0 \le i < n$, bit $i$ of the output can depend only on bits
$0, 1, \ldots, i$ of the input. All the boolean operations and most of the
numeric operations in modern processors are T-functions, and their
compositions are also T-functions. In earlier papers we considered ‘crazy’
T-functions such as $f(x) = x + (x^2 \vee 5)$, proved that they are invertible
mappings which contain all the $2^n$ possible states on a single cycle for
any word size $n$, and proposed to use them as primitive building blocks in
a new class of software-oriented cryptographic schemes. The main practical
drawback of this approach is that most processors have either 32 or 64
bit words, and thus even a maximal length cycle (of size $2^{32}$ or $2^{64}$) may
be too short. In this paper we develop new ways to construct invertible
T-functions on multiword states whose iteration is guaranteed to yield
a single cycle of arbitrary length (say, $2^{256}$). Such mappings can lead to
stream ciphers whose software implementation on a standard Pentium 4
processor can encrypt more than 5 gigabits of data per second, which is
an order of magnitude faster than previous designs such as RC4.



1 Introduction 

There are two basic approaches to the design of secret key cryptographic schemes, 
which we can call ‘tame’ and ‘wild’. In the tame approach we try to use only 
simple primitives (such as linear feedback shift registers) with well understood 
behaviour, and try to prove mathematical theorems about their cryptographic 
properties. Unfortunately, the clean mathematical structure of such schemes can 
also help the cryptanalyst in his attempt to find an attack which is faster than 
exhaustive search. In the wild approach we use crazy compositions of operations 
(which mix a variety of domains in a nonlinear and nonalgebraic way), hoping 
that neither the designer nor the attacker will be able to analyse the
mathematical behaviour of the scheme. The first approach is typically preferred in
textbooks and toy schemes, but real world designs often use the second approach. 

In several papers published in the last few years [5, 6], we tried to bridge this 
gap by considering ‘semi-wild’ constructions which look like crazy combinations 
of boolean and arithmetic operations, but have many analyzable mathematical 
properties. In particular, we defined the class of T-functions which contains
arbitrary compositions of plus, minus, times, or, and, xor operations on n-bit
words, and showed that it is easy to analyse their invertibility and cycle
structure for arbitrary word sizes. Such constructions can replace LFSRs and
linear congruential mappings (which are vulnerable to correlation and algebraic
attacks) in a new class of stream ciphers and pseudo random generators.

The paper is organized in the following way. In section 2 we recall the basic 
definitions from [5] for single word mappings, and consider several ways in which 
they can be extended to the multiword case. In section 3 we extend our bit-slice 
technique to analyse the invertibility of multiword T-functions. In section 4 we 
extend our technique from [6] to analyse the cycle structure of multiword
T-functions. Finally, in section 5 we provide experimental data on the speed of
several possible implementations of our functions on a PC. 

2 Multiword T-Functions 

Invertible mappings with a single cycle have many cryptographic applications. 
The main context in which we study them in this paper is pseudo random
generation and stream ciphers. Modern microprocessors can directly operate on up
to 64-bit words in a single clock cycle, and thus a univariate mapping can go
through at most $2^{64}$ different states before entering a cycle. In some
cryptographic applications this cycle length may be too short, and in addition the
cryptanalyst can guess a 64 bit state in a feasible computation. A common way
to increase the size of the state and extend the period of a generator is to run
in parallel and combine the outputs of several generators with different periods.
The overall period is determined by the least common multiple of their individual
periods. This works well with LFSRs, whose periods $2^{n_1} - 1, 2^{n_2} - 1, \ldots$
can be relatively prime, and thus the overall period can be their product.
However, our univariate mappings have periods of $2^{n_1}, 2^{n_2}, \ldots$ whose
least common multiple is just $2^{\max(n_1, n_2, \ldots)}$.
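
The period argument can be checked numerically. The following Python sketch uses three illustrative sizes (3, 5 and 7, chosen here only as an assumption for the example) and compares the combined period of generators with periods $2^{n_j} - 1$ against generators with periods $2^{n_j}$.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def combined_period(periods):
    """Period of generators run in parallel: the lcm of the individual periods."""
    return reduce(lcm, periods, 1)


sizes = [3, 5, 7]  # illustrative sizes n_1, n_2, n_3 (an assumption)
lfsr_periods = [2**n - 1 for n in sizes]  # 7, 31, 127 are pairwise coprime
tfunc_periods = [2**n for n in sizes]     # 8, 32, 128 are all powers of two

print(combined_period(lfsr_periods))   # the product 7 * 31 * 127
print(combined_period(tfunc_periods))  # only 2**max(sizes)
```

In the second case the two smaller generators add nothing to the period, which is the problem the multiword constructions address.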

A partial solution to this problem is to cyclically use a large number of
different state update functions, starting from a secret state and a secret index.
For example, we can use 64-bit words and $2^{16} - 1$ different constants $C_k$ to get
a guaranteed cycle length of almost $2^{80}$ from the following simple generator:

Theorem 1. Consider the sequence defined by iterating

$$x_{i+1} = x_i + (x_i^2 \vee C_{k_i}) \bmod 2^n,$$
$$k_{i+1} = k_i + 1 \bmod m,$$

where each $x$ is an $n$-bit word and $C_k$ is some $n$-bit constant for each $k =
0, \ldots, m - 1$. Then the sequence of pairs $(x_i, k_i)$ has a maximal period (of size
$m 2^n$) if and only if $m$ is odd, and for all $k$, $[C_k]_0 = 1$ and $[C_k]_2 = 1$.

A special case of this theorem for $m = 1$ is that the function $f(x) = x +
(x^2 \vee C)$ is invertible with a single cycle if and only if both the least significant
bit and the third least significant bit in $C$ are 1, and the smallest such $C$ is 5.
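
For small word sizes the generator can simply be run until the pair $(x, k)$ returns to its starting value. The sketch below is a brute-force check, not an efficient implementation; the function name `pair_period` and the three constants in `good` are hypothetical choices made for this example, each with bit 0 and bit 2 set.

```python
def pair_period(constants, n):
    """Steps until (x, k) first returns to (0, 0), or None if it never does.

    One step is x <- x + (x*x | C_k) mod 2**n, followed by k <- k + 1 mod m.
    """
    m, mask = len(constants), (1 << n) - 1
    x, k = 0, 0
    for steps in range(1, m * (1 << n) + 1):
        x = (x + ((x * x) | constants[k])) & mask
        k = (k + 1) % m
        if (x, k) == (0, 0):
            return steps
    return None


# Hypothetical constants: bits 0 and 2 are set in each, and m = 3 is odd.
good = [0b0101, 0b0111, 0b1101]
print(pair_period(good, 8) == 3 * 2**8)

# m = 1: C = 5 reaches the full period 2**n for every tested n.
print(all(pair_period([5], n) == 2**n for n in range(1, 9)))

# C = 1 and C = 3 have bit 2 clear and fall short already modulo 8.
print(pair_period([1], 3), pair_period([3], 3))
```

The check only exercises the sufficient direction of the condition for a few parameters; it is not a substitute for the proof.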

Unfortunately, the cyclic change of state update functions is inconvenient,
and it cannot yield really large cycles (e.g., of $2^{256}$ possible states). We can
try to solve the problem by using a single high precision variable $x$ (say, with
256 bits), but the multiplication of such long variables can become prohibitively
expensive. What we would like to do is to define the mapping by operating
separately on the various input words, without trying to interpret the result as
a natural mathematical operation on multi-precision words.

Let us first review the definitions from [5] in the case of univariate
mappings. Let $x$ be an $n$-bit word. We can view $x$ as a vector of bits denoted by
$([x]_{n-1}, \ldots, [x]_0)$, where the least significant bit has number 0. In this bit
notation, the univariate function $f(x) = x + 1 \pmod{2^n}$ can be expressed in the
following way:



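$$
\begin{array}{lll}
[f(x)]_0 & = f_0([x]_0) & = [x]_0 \oplus 1 \\
[f(x)]_1 & = f_1([x]_1; [x]_0) & = [x]_1 \oplus \alpha_1([x]_0) \\
[f(x)]_2 & = f_2([x]_2; [x]_1, [x]_0) & = [x]_2 \oplus \alpha_2([x]_1, [x]_0) \\
 & \vdots & \\
[f(x)]_{n-1} & = f_{n-1}([x]_{n-1}; [x]_{n-2}, \ldots, [x]_0) & = [x]_{n-1} \oplus \alpha_{n-1}([x]_{n-2}, \ldots, [x]_0),
\end{array}
$$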



where each $\alpha_i$ denotes one of the carry bits. Note that for any bit position $i$,
$[f(x)]_i$ depends only on $[x]_i, \ldots, [x]_0$ and does not depend on
$[x]_{n-1}, \ldots, [x]_{i+1}$. We call any univariate function $f$ which has this
property a T-function (where ‘T’ is short for triangular). Note further that each
carry bit $\alpha_i$ depends only on strictly earlier input bits $[x]_{i-1}, \ldots, [x]_0$
but not on $[x]_i$. This is a special type of a T-function, which we call a parameter.
To provide some intuition from the theory of linear transformations on
n-dimensional spaces, we can say that T-functions roughly correspond to lower
triangular matrices, parameters roughly correspond to lower triangular matrices
with zeroes on the diagonal, and a T-function can be roughly represented as a
diagonal matrix plus a parameter.
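
The triangular form of $x + 1$ can be verified exhaustively for a small word size. In the Python sketch below, `alpha` is the carry into bit $i$, written explicitly as a function of bits $0, \ldots, i-1$ only; the word size `N = 8` and the helper names are assumptions made for this example.

```python
N = 8
MASK = (1 << N) - 1


def bit(x, i):
    """Bit i of x, with bit 0 the least significant."""
    return (x >> i) & 1


def alpha(x, i):
    """Carry into bit i when adding 1: it is 1 exactly when bits 0..i-1 of x are all 1."""
    low = (1 << i) - 1
    return 1 if (x & low) == low else 0


def matches_triangular_form():
    """Check [x + 1]_i == [x]_i XOR alpha_i for every N-bit x and every i."""
    return all(
        bit((x + 1) & MASK, i) == (bit(x, i) ^ alpha(x, i))
        for x in range(1 << N)
        for i in range(N)
    )


print(matches_triangular_form())
```

Since `alpha(x, i)` never reads bit $i$ or any higher bit of `x`, it is a parameter in the sense defined above.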

Let us now define these notions for functions which map several input words
into one output word. The natural extension of the notion of a T-function in
this case is to allow bit $i$ of the output to depend only on bits 0 to $i$ of each
one of the inputs. The observation which makes this notion interesting is that
all the boolean operations and most of the arithmetic operations available on
modern processors are T-functions. In particular, addition (‘$+$’), subtraction
(‘binary $-$’), negation (‘unary $-$’), multiplication (‘$\times$’), or (‘$\vee$’),
and (‘$\wedge$’), exclusive or (‘$\oplus$’), and complementation (‘$\neg$’) (where the
boolean operations are performed on all the $n$ bits in parallel and the arithmetic
operations are performed modulo $2^n$) are T-functions with one or two inputs. We
call these eight functions primitive operations. Note that circular rotations and
right shifts are not T-functions, but left shifts can be expressed as multiplication
by a power of 2 and thus they are T-functions. Since the composition of
T-functions is also a T-function, any ‘crazy’ function which contains arbitrarily
many primitive operations is always a T-function.
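
The T-function property is easy to test by brute force for small words: truncating every input to its low $i+1$ bits must not change the low $i+1$ bits of the output. The Python sketch below applies this test to the eight primitive operations and to a few other functions; the helper `is_t_function` and the word size `N = 4` are assumptions made for this example.

```python
from itertools import product


def is_t_function(f, n, arity):
    """True if bits 0..i of f's output depend only on bits 0..i of each n-bit input."""
    for args in product(range(1 << n), repeat=arity):
        full = f(*args)
        for i in range(n):
            low = (1 << (i + 1)) - 1
            # Compare only the low i+1 bits; Python's & works on negative ints too.
            if (f(*(a & low for a in args)) ^ full) & low:
                return False
    return True


N = 4
MASK = (1 << N) - 1
primitives = {
    "add": (lambda x, y: x + y, 2),
    "sub": (lambda x, y: x - y, 2),
    "neg": (lambda x: -x, 1),
    "mul": (lambda x, y: x * y, 2),
    "or": (lambda x, y: x | y, 2),
    "and": (lambda x, y: x & y, 2),
    "xor": (lambda x, y: x ^ y, 2),
    "not": (lambda x: ~x, 1),
}
others = {
    "left shift": (lambda x: x << 1, 1),
    "right shift": (lambda x: x >> 1, 1),
    "rotate left": (lambda x: ((x << 1) | (x >> (N - 1))) & MASK, 1),
    "x + (x*x | 5)": (lambda x: x + ((x * x) | 5), 1),
}

for name, (f, arity) in {**primitives, **others}.items():
    print(name, is_t_function(f, N, arity))
```

The right shift and the rotation fail the test because they move high-order input bits into low-order output positions.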

In order to define multiword mappings $f$ which can be iterated, we have to
further extend the notion to functions with the same number $m$ of input and
output words. We can represent the multiword input as the following $n \times m$ bit
matrix $B^{n \times m}$:



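$$
\begin{pmatrix} x_0 \\ \vdots \\ x_{m-1} \end{pmatrix}
=
\begin{pmatrix}
[x]_{0,n-1} & \cdots & [x]_{0,1} & [x]_{0,0} \\
[x]_{1,n-1} & \cdots & [x]_{1,1} & [x]_{1,0} \\
\vdots & & \vdots & \vdots \\
[x]_{m-1,n-1} & \cdots & [x]_{m-1,1} & [x]_{m-1,0}
\end{pmatrix}
\tag{2}
$$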



We can now consider the columns of the bit matrix as parallel bit slices with 
no internal bit order, and say that a multiword mapping is a T-function if all 
the bits in column i of the matrix of output words can depend only on bits 
in columns 0 to i of the matrix of input words. In this interpretation it is still 
true that any composition of primitive operations is a multiword T-function, but 
some of the proven properties of univariate T-functions (e.g., that all the cycle 
lengths are powers of 2) are no longer true. 
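
A two-word example shows how the power-of-two property can fail. The Python sketch below uses the mapping $(x_0, x_1) \to (x_1, x_0 \oplus x_1)$, which is an illustrative choice and not taken from the paper; it is built only from the primitive xor, so each output column depends only on the same input column, yet it has cycles of length 3.

```python
from itertools import product


def cycle_lengths(step, states):
    """Sorted cycle lengths of the permutation `step` acting on `states`."""
    seen, lengths = set(), []
    for start in states:
        if start in seen:
            continue
        length, s = 0, start
        while s not in seen:
            seen.add(s)
            s = step(s)
            length += 1
        lengths.append(length)
    return sorted(lengths)


def step(state):
    """(x0, x1) -> (x1, x0 XOR x1): each output column uses only the same input column."""
    x0, x1 = state
    return (x1, x0 ^ x1)


for n in (1, 2):
    states = list(product(range(1 << n), repeat=2))
    print(n, cycle_lengths(step, states))
```

The all-zero state is a fixed point and every other state lies on a cycle of length 3, because the underlying $2 \times 2$ bit matrix has order 3 over GF(2).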

An alternative definition of multiword T-functions is to concatenate all the
input words into one long word, to concatenate all the output words into one
long word, and then to use the standard univariate definition of a T-function in
order to limit which input bits can affect which output bits. If we denote the $l$
input words by $x_u, x_v, \ldots$, then we define the single logical variable $x$ by

$$x = (x_u, \ldots, x_w) = ([x]_{n(l-1)+(n-1)}, \ldots, [x]_{n(l-1)}, \ldots, [x]_{n-1}, \ldots, [x]_0). \tag{3}$$

Note that in this interpretation $f(x) = (f_u, f_v) = (x_u + x_v, x_v)$ is a
T-function, but the very similar $f(x) = (f_u, f_v) = (x_u, x_u + x_v)$ is not a
T-function, and thus we cannot compose primitive operations in an arbitrary way.
On the other hand, we can obtain many new types of T-functions in which low-order
words can be manipulated by non-primitive operations (such as cyclic rotation)
before we use them to compute higher order output words.
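
The two examples can be checked by lifting a map on word pairs to a map on one concatenated word and applying the univariate test. In the Python sketch below, the word size `N = 3`, the helpers `lift`, `is_univariate_t_function` and `rotl`, and the third mapping `f_rot` are assumptions made for this example.

```python
N = 3  # word size; kept small so that brute force is instant
MASK = (1 << N) - 1


def lift(f):
    """View a map on word pairs (x_u, x_v) as a map on one 2N-bit word, x_u on top."""
    def g(x):
        u, v = f((x >> N) & MASK, x & MASK)
        return ((u & MASK) << N) | (v & MASK)
    return g


def is_univariate_t_function(g, bits):
    """True if bits 0..i of g(x) depend only on bits 0..i of x, for every i."""
    for x in range(1 << bits):
        for i in range(bits):
            low = (1 << (i + 1)) - 1
            if (g(x & low) ^ g(x)) & low:
                return False
    return True


def rotl(v):
    """Cyclic left rotation of an N-bit word by one position (non-primitive)."""
    return ((v << 1) | (v >> (N - 1))) & MASK


f_good = lambda u, v: (u + v, v)       # the high word may use the low word
f_bad = lambda u, v: (u, u + v)        # the low word must not use the high word
f_rot = lambda u, v: (u ^ rotl(v), v)  # rotating the low word is allowed here

for name, f in (("good", f_good), ("bad", f_bad), ("rot", f_rot)):
    print(name, is_univariate_t_function(lift(f), 2 * N))
```

The rotation is harmless in `f_rot` because every bit of the low word already sits below every bit of the high output word.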

Our actual definition of multiword T-functions combines and generalizes
these two possible interpretations. Let $x$ be an $nl \times m$ bit matrix ($B^{nl \times m}$):

$$
x = \begin{pmatrix}
[x]_{0,n(l-1)+(n-1)} & \cdots & [x]_{0,n(l-1)+1} & [x]_{0,n(l-1)} & \cdots & [x]_{0,0} \\
[x]_{1,n(l-1)+(n-1)} & \cdots & [x]_{1,n(l-1)+1} & [x]_{1,n(l-1)} & \cdots & [x]_{1,0} \\
\vdots & & \vdots & \vdots & & \vdots \\
[x]_{m-1,n(l-1)+(n-1)} & \cdots & [x]_{m-1,n(l-1)+1} & [x]_{m-1,n(l-1)} & \cdots & [x]_{m-1,0}
\end{pmatrix}
$$

We consider it as an $m \times l$ matrix of $n$-bit words

$$
x = \begin{pmatrix}
x_{0,u} & \cdots \\
\vdots & \\
x_{m-1,u} & \cdots
\end{pmatrix}.
$$

We concatenate the $l$ words in each row into a single logical variable, and then
consider the collection of the $m$ long variables as the inputs to the T-function.
Finally, we allow the bits in column $i$ of the output matrix to depend only on
bits in columns $0, \ldots, i$ in the input matrix.

To demonstrate this notion, consider the following mapping over 4-tuples of
words:

$$
f(x) = \begin{pmatrix}
x_{0,u} + x_{1,u}\,x_{0,v} & x_{0,v} \oplus x_{1,v} \\
\multicolumn{2}{c}{\text{[second row illegible in the source]}}
\end{pmatrix}
$$

This is a valid T-function under our general multiword definition even though
it contains the non-primitive right shift operation $\gg 1$.



3 Bit Slice Analysis and Invertibility 

The main tool we use in order to study the invertibility of T-functions is bit
slice analysis. Its basic idea is to define the mapping from $[x]_i$ to $[f(x)]_i$ by
abstracting out the complicated dependency on $[x]_0, \ldots, [x]_{i-1}$ via the notion
of parameters. For example, the size of the explicit description of the mapping
$[f(x)]_i = \varphi([x]_0, \ldots, [x]_i)$ in the function $f(x) = x + (x^2 \vee 5)$ grows
exponentially with $i$, but it can be written as $[f(x)]_i = [x]_i \oplus \alpha_i$, where
$\alpha_i$ is some function of $[x]_0, \ldots, [x]_{i-1}$ (that is, a parameter). By using this
parametric representation we can easily prove the invertibility of the mapping by
induction on $i$, since if we already know bits 0 to $i - 1$ of the input $x$ and bit $i$
of the output $f(x)$, we can (in principle) calculate the value of $\alpha_i$ and thus
derive in a unique way bit $i$ of the input. Intuitively, this is the same technique
we use in order to solve a triangular system of linear equations, except that in
our case the explicit description of $\alpha_i$ can be extremely complicated, and thus we
do not use this technique as a real inversion algorithm for $f(x)$, but only in
order to prove that this inverse is uniquely defined.
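
The induction can be turned into a small bit-by-bit inversion routine for the running example. The Python sketch below does not use an explicit formula for $\alpha_i$; instead it obtains $\alpha_i$ by evaluating $f$ on the bits recovered so far with bit $i$ set to 0. The word size `N = 8` and the function names are assumptions made for this example.

```python
N = 8
MASK = (1 << N) - 1


def f(x):
    """The running example x + (x^2 OR 5), reduced modulo 2**N."""
    return (x + ((x * x) | 5)) & MASK


def invert(y):
    """Recover the x with f(x) == y, one bit at a time from bit 0 upward."""
    x = 0
    for i in range(N):
        # With bit i of x still 0, bit i of f(x) equals the parameter alpha_i,
        # which depends only on the already recovered bits 0..i-1.
        alpha_i = (f(x) >> i) & 1
        x |= (((y >> i) & 1) ^ alpha_i) << i
    return x


print(all(invert(f(x)) == x for x in range(1 << N)))
```

Each step solves one equation of the triangular system, $[x]_i = [f(x)]_i \oplus \alpha_i$, exactly as in the argument above.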

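The main observation in [5] was that such an abstract parametric
representation can be easily derived for any composition of primitive operations by
the following recursive definition, in which $i$ can be any bit position except zero:

$$
\begin{array}{ll}
[x \times y]_0 = [x]_0 \wedge [y]_0, & [x \times y]_i = [x]_i [y]_0 \oplus [x]_0 [y]_i \oplus \alpha_{xy}, \\
[x \pm y]_0 = [x]_0 \oplus [y]_0, & [x \pm y]_i = [x]_i \oplus [y]_i \oplus \alpha_{x \pm y}, \\
[x \oplus y]_0 = [x]_0 \oplus [y]_0, & [x \oplus y]_i = [x]_i \oplus [y]_i, \\
[x \wedge y]_0 = [x]_0 \wedge [y]_0, & [x \wedge y]_i = [x]_i \wedge [y]_i, \\
[x \vee y]_0 = [x]_0 \vee [y]_0, & [x \vee y]_i = [x]_i \vee [y]_i.
\end{array}
\tag{5}
$$

To demonstrate this technique, consider our running example:
$[x + (x^2 \vee 5)]_0 = [x]_0 \oplus [x^2 \vee 5]_0 = [x]_0 \oplus 1$ and, for $i > 0$,

$$
\begin{aligned}
[x + (x^2 \vee 5)]_i &= [x]_i \oplus ([x^2]_i \vee [5]_i) \oplus \alpha_{x + (x^2 \vee 5)} \\
&= [x]_i \oplus (([x]_i [x]_0 \oplus [x]_0 [x]_i \oplus \alpha_{x^2}) \vee [5]_i) \oplus \alpha_{x + (x^2 \vee 5)} \\
&= [x]_i \oplus \alpha_i.
\end{aligned}
$$

Here the two product terms cancel, so the bit slice is $[x]_i$ xored with a
parameter.

This invertibility test can be easily generalized to the multivariate case (2).
Let us show an example of such a construction. We start from an arbitrary
nonsingular matrix which denotes a possible bit slice mapping.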

We can add to this linear mapping an affine part (where $\alpha$, $\beta$, and $\gamma$ are
arbitrary parameters) and get the following bit slice structure:

$$
\begin{pmatrix} [x_0]_i \\ [x_1]_i \end{pmatrix}
\to
\begin{pmatrix} [x_0]_i \oplus \alpha [x_1]_i \oplus \beta \\ [x_1]_i \oplus \gamma \end{pmatrix}
\tag{6}
$$

It is easy to check that for $i > 0$ the $i$-th bit slice of the following mapping
matches (6):

$$
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
\to
\begin{pmatrix} x_0 + (x_0^2 \wedge x_1) \\ x_1 + x_0^2 \end{pmatrix}
$$

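Unfortunately, the least significant bit slice of this mapping is not invertible:

$$
\begin{pmatrix} [x_0]_0 \\ [x_1]_0 \end{pmatrix}
\to
\begin{pmatrix} [x_0]_0 \oplus [x_0]_0 [x_1]_0 \\ [x_1]_0 \oplus [x_0]_0 \end{pmatrix}
$$

So we have to apply a little tweak to fix it:

$$
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
\to
\begin{pmatrix} x_0 + ((x_0^2 \wedge x_1) \vee 1) \\ x_1 + x_0^2 \end{pmatrix}
$$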
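
For a small word size the effect of the tweak can be confirmed by enumerating all states. The Python sketch below assumes the two mappings are $(x_0, x_1) \to (x_0 + (x_0^2 \wedge x_1),\ x_1 + x_0^2)$ and the same mapping with the constant 1 ORed into the first summand; the word size `N = 4` and the function names are choices made for this example.

```python
from itertools import product

N = 4  # small word size, so all 2**(2*N) states can be enumerated
MASK = (1 << N) - 1


def original(x0, x1):
    """(x0, x1) -> (x0 + (x0^2 AND x1), x1 + x0^2), all modulo 2**N."""
    sq = x0 * x0
    return ((x0 + (sq & x1)) & MASK, (x1 + sq) & MASK)


def tweaked(x0, x1):
    """Same mapping with the constant 1 ORed into the first summand."""
    sq = x0 * x0
    return ((x0 + ((sq & x1) | 1)) & MASK, (x1 + sq) & MASK)


def is_invertible(f):
    """A map on a finite set is invertible exactly when no two states share an image."""
    states = list(product(range(1 << N), repeat=2))
    return len({f(*s) for s in states}) == len(states)


print(is_invertible(original), is_invertible(tweaked))
```

The untweaked mapping already fails modulo 2, where the states $(0, 0)$ and $(1, 1)$ have the same image, and a collision in the lowest bit slice rules out invertibility for every word size.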

The reader may get the impression that the bit slice mappings of invertible
functions are always linear. From (5) it is easy to see that every expression which
uses only $\oplus$, $+$, $-$ and $\times$ has linear $i$-th bit slice, but in general this is
not true.

4 The Single Cycle Property 

A T-function has the single cycle property if its repeated application to any
initial state goes through all the possible states. Let us recall the basic results
from [6] in the univariate case. Invertibility is a prerequisite of the single cycle
property. If a T-function has a single cycle modulo $2^k$ then it has a single cycle
modulo $2^{k-1}$. If a T-function has a cycle of length $l$ modulo $2^{k-1}$ then modulo
$2^k$ it has either a cycle of length $2l$ or two cycles of length $l$. Taking into
account the fact that modulo $2^1$ a function has either one cycle of length two or
two cycles of length one we can conclude that the size of any cycle of a T-function
is always a power of 2.
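
The doubling behaviour can be observed on a concrete invertible T-function that is not a single cycle. The Python sketch below uses $f(x) = x + (x^2 \vee 1)$, an illustrative choice made for this example, and lists its cycle lengths modulo $2^n$ for small $n$.

```python
def cycle_lengths(f, n):
    """Sorted cycle lengths of the permutation x -> f(x) mod 2**n."""
    mask = (1 << n) - 1
    seen, lengths = set(), []
    for start in range(1 << n):
        if start in seen:
            continue
        x, length = start, 0
        while x not in seen:
            seen.add(x)
            x = f(x) & mask
            length += 1
        lengths.append(length)
    return sorted(lengths)


def f(x):
    """An invertible T-function whose constant has bit 2 clear."""
    return x + ((x * x) | 1)


for n in range(1, 7):
    print(n, cycle_lengths(f, n))
```

Modulo 4 the function has one cycle of length 4, and modulo 8 this cycle splits into two cycles of length 4 instead of doubling, which is the second of the two cases described above.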

In the univariate case a bit slice of an invertible T-function has the form
$[f(x)]_i = [x]_i \oplus \alpha_i$. From (5) it follows that $f(x)$ has one of the following
forms: $f_1(x) = x \oplus r_1(x)$, $f_2(x) = x + r_2(x)$ or $f_3(x) = x\,r_3(x)$, where the
$r_i$ are parameters (in the case of multiplication we additionally need
$[r_3]_0 = 1$). It is easy to see that $[f_3(x)]_0 = [x]_0 [r_3(x)]_0 = [x]_0$, that is,
it has two cycles modulo 2 and so it cannot form a single cycle modulo $2^n$. So a
single cycle function has either¹ the first or the second form. In order to analyse
the cycle structure of these forms the following definitions of even and odd
parameters were introduced. Suppose that $r(x)$ is a parameter, that is
$r(x) = r(x + 2^{n-1}) \pmod{2^n}$. So, $r(x) = r(x + 2^{n-1}) + 2^n b(x) \pmod{2^{n+1}}$.
Consider

$$
B[r, n] = 2^{-n} \sum_{i=0}^{2^{n-1}-1} \left( r(i + 2^{n-1}) - r(i) \right) \pmod 2
= \bigoplus_{i=0}^{2^{n-1}-1} b(i). \tag{7}
$$

The parameter is called even if $B[r, n]$ is always zero, and odd if $B[r, n]$ is
always one².

¹ Note that there is no exclusive or here since every function can be represented
in both forms, for example $x + 1 = x \oplus (x \oplus (x + 1))$.

Let us give several examples of even parameters: 

- $r(x) = C$, where $C$ is an arbitrary constant ($r(x) = r(x + 2^{n-1})$ and so
  $b = 0$ and $B = 0$);
- $r(x) = 2x$ ($r(x + 2^{n-1}) = r(x) + 2^n \pmod{2^{n+1}}$, so $b(x) = 1$ and $B$ is
  even as long as $2^{n-1}$ is even, that is for $n \ge 2$);
- $r(x) = x^2$ ($r(x + 2^{n-1}) = r(x) + 2^n x + 2^{2(n-1)}$, so $b(x) = [x]_0$ and $B$
  is even for $n \ge 3$);
- $r(x) = 4q(x)$, where $q(x)$ is an arbitrary T-function ($r(x + 2^{n-1}) - r(x) =
  4(q(x + 2^{n-1}) - q(x)) = 0 \pmod{2^{n+1}}$);
- $r(x) = r'(x) - r''(x)$, where $r'$ and $r''$ are simultaneously even or odd
  parameters ($B = B' \oplus B''$);
- $r(x) = r'(x) \vee C$, where $C$ is an arbitrary constant and $r'(x)$ is an even
  parameter (if $[C]_i = 0$ then $[r(x)]_i = [r'(x)]_i$, and if $[C]_i = 1$ then
  $[r(x)]_i = [C]_i$, so in both cases $[r(x)]_i$ is the same as for some even
  parameter).
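
The quantity $B[r, n]$ from (7) can be computed directly for small $n$. The Python sketch below is a brute-force evaluation under the stated definition; the function name `B`, the example parameters, and the particular T-function used for $q$ are assumptions made for this example.

```python
def B(r, n):
    """B[r, n]: XOR over i < 2**(n-1) of b(i), where
    r(i + 2**(n-1)) - r(i) = 2**n * b(i) modulo 2**(n+1)."""
    half = 1 << (n - 1)
    acc = 0
    for i in range(half):
        diff = (r(i + half) - r(i)) % (1 << (n + 1))
        if diff % (1 << n):
            raise ValueError("r is not a parameter at this n")
        acc ^= diff >> n
    return acc


constant = lambda x: 7
double = lambda x: 2 * x
square = lambda x: x * x
four_q = lambda x: 4 * (x * x + (x | 3))  # q is a hypothetical T-function
square_or_5 = lambda x: (x * x) | 5

for name, r, start in (
    ("C", constant, 1),
    ("2x", double, 1),
    ("x^2", square, 2),
    ("4q(x)", four_q, 1),
    ("x^2 | 5", square_or_5, 3),
):
    print(name, [B(r, n) for n in range(start, 9)])
```

The printed rows show the small-$n$ exceptions mentioned in the list: $2x$ is odd only at $n = 1$ and $x^2$ only at $n = 2$.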

The following theorem was proved in [6]: 

Theorem 2. Let $N_0$ be such that $x \to x + r(x) \bmod 2^{N_0}$ defines a single cycle
and for $n > N_0$ the function $r(x)$ is an even parameter. Then the mapping
$x \to x + r(x) \bmod 2^n$ defines a single cycle for all $n$.

We can use our running example of $f(x) = x + (x^2 \vee C)$ to demonstrate this
theorem. If the binary form of $C$ ends with ...101 or ...111, then
$C \equiv 5, 7 \pmod 8$, and $x^2 \vee C \equiv C \pmod 8$ is an odd constant modulo
$2^3$, so $x + C$ has a single cycle modulo $2^3$. In addition, $x^2$ is an even
parameter for $n \ge 3$, and this is not affected by ‘or’ing it with an arbitrary
constant. In [6] it was shown that $x \to x + (x^2 \vee C)$ is the smallest nonlinear
expression which defines a single cycle; in other words, there is no nonlinear
expression which defines a single cycle and consists of fewer than three
operations.
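
Both steps of this argument, and its conclusion, can be checked by brute force for small word sizes. In the Python sketch below the helper names and the tested ranges are assumptions made for this example.

```python
def is_single_cycle(f, n):
    """True if iterating f modulo 2**n from 0 visits all 2**n states before returning to 0."""
    mask = (1 << n) - 1
    x = 0
    for _ in range((1 << n) - 1):
        x = f(x) & mask
        if x == 0:
            return False  # returned to the start too early
    return (f(x) & mask) == 0


def running_example(c):
    """The mapping x -> x + (x^2 OR c) for a given constant c."""
    return lambda x: x + ((x * x) | c)


# Modulo 8 the parameter x^2 OR C collapses to the constant C when C = 5, 7 (mod 8).
print(all(((x * x) | c) % 8 == c % 8 for c in (5, 7, 13, 15) for x in range(64)))

# Single cycle for every tested word size.
print(all(is_single_cycle(running_example(c), n) for c in (5, 7, 13) for n in range(1, 13)))

# For 6-bit words, exactly the constants with bits 0 and 2 set give a single cycle.
print(all(is_single_cycle(running_example(c), 6) == ((c & 5) == 5) for c in range(64)))
```

The last line is consistent with the special case $m = 1$ of Theorem 1.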

Another important class of single cycle mappings is $f(x) = 1 + x + 4g(x)$
for an arbitrary T-function $g(x)$. It turns out that x86 microprocessors have an
instruction which allows us to calculate $1 + x + 4y$ with a single instruction³ and

² Note that in the general case $B$ is a function of $n$, and thus the parameter can be
neither even nor odd. We often relax these definitions by allowing exceptions for
small $n$ such as 1 or 2.

³ The lea (load effective address) instruction makes it possible to calculate any
expression of the form $C + R_1 + kR_2$, where $C$ is a constant, $R_1$ and $R_2$ are
registers and $k$ is a power of two. Its original purpose was to simplify the
calculation of addresses in arrays.



